Learning from Labeled Features for Document Filtering
Abstract
Existing document filtering systems learn user profiles from user relevance feedback on documents. In some cases, users have prior knowledge about which features are important. For example, a Spanish speaker may only want news written in Spanish, so a relevant document should contain the feature "Language: Spanish"; a researcher focusing on HIV knows that an article with the medical subject "Subject: AIDS" is very likely to be relevant to him or her. Semi-structured documents with rich metadata are increasingly prevalent on the Internet. Motivated by the well-adopted faceted search interface in e-commerce, we study how to exploit user prior knowledge on faceted features for semi-structured document filtering. We envision two faceted feedback mechanisms and propose a novel user profile learning algorithm that can incorporate user feedback on features. To evaluate the proposed work, we use two data sets from the TREC filtering track and conduct a user study on Amazon Mechanical Turk. Our experimental results show that user feedback on faceted features is useful for filtering. The proposed user profile learning algorithm can effectively learn from user feedback on both documents and features, and performs better than several existing methods.
Similar resources
Comprehensive Analysis of Dense Point Cloud Filtering Algorithm for Eliminating Non-Ground Features
Point cloud and LiDAR filtering removes non-ground features from a digital surface model (DSM) to reach the bare earth and extract a DTM. Researchers have proposed various methods to distinguish ground from non-ground points in point cloud and LiDAR data. Most fully automated methods share a common disadvantage: they are only effective for a particular type of surf...
Learning Document Image Features With SqueezeNet Convolutional Neural Network
The classification of various document images is considered an important step towards building a modern digital library or office automation system. Convolutional Neural Network (CNN) classifiers trained with backpropagation are considered the current state-of-the-art model for this task. However, these classifiers have two major drawbacks: the huge computational power demand for...
RRLUFF: Ranking function based on Reinforcement Learning using User Feedback and Web Document Features
The principal aim of a search engine is to provide results sorted according to the user's requirements. To achieve this aim, it employs ranking methods to rank web documents based on their significance and relevance to the user query. The novelty of this paper is a user-feedback-based ranking algorithm that uses reinforcement learning. The proposed algorithm is called RRLUFF, in which the rank...
Document Filtering Boosted by Unlabeled Data
This paper describes three learning methods for document filtering that use unlabeled data. The proposed methods are based on a committee of classifiers that are trained on a small set of labeled data and then augmented with a large amount of unlabeled data. By taking advantage of unlabeled data, the amount of labeled data needed is significantly reduced and the filtering accuracy ...
A New Similarity Measure Based on Item Proximity and Closeness for Collaborative Filtering Recommendation
Recommender systems use information retrieval and machine learning techniques to filter information and can predict whether a user would like an unseen item. User similarity measurement plays an important role in recommender systems based on collaborative filtering. To improve the accuracy of traditional user-based collaborative filtering techniques under the new-user cold-start problem a...
Journal: CoRR
Volume: abs/1412.8125
Publication date: 2014